Applying Finite State Morphology to Conversion Between Roman and Perso-Arabic Writing Systems
Authors
Abstract
This paper presents a method for converting back and forth between the Perso-Arabic and Romanized writing systems for Persian. Given a word in one writing system, we use finite state transducers to generate a morphological analysis of the word, which is then used to regenerate the word's orthography in the other writing system. The system has been implemented in XFST and LEXC.
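The analyze-then-generate idea in the abstract can be sketched in a few lines of Python. This is only a toy illustration, not the paper's XFST/LEXC implementation: plain dictionaries stand in for the transducers, the two stems and the plural suffix are a hypothetical mini-lexicon, the tags `+N` and `+Pl` are made up for the example, and the ASCII Romanization is an assumption rather than the scheme the paper uses.

```python
# Toy stand-in for a pair of finite state transducers that share one
# morphological analysis level. All entries below are illustrative only.
PERSO, ROMAN = 0, 1

# analysis tag -> (Perso-Arabic form, Romanized form); hypothetical lexicon
STEMS = {
    "ketab+N": ("کتاب", "ketab"),  # "book"
    "gol+N": ("گل", "gol"),        # "flower"
}
# The plural suffix is attached with a zero-width non-joiner (U+200C)
# on the Perso-Arabic side.
SUFFIXES = {
    "": ("", ""),
    "+Pl": ("\u200cها", "ha"),
}


def build_analyzer(side):
    """Return a surface-form -> analysis table for one writing system."""
    table = {}
    for stem_tag, stem_forms in STEMS.items():
        for suffix_tag, suffix_forms in SUFFIXES.items():
            surface = stem_forms[side] + suffix_forms[side]
            table[surface] = stem_tag + suffix_tag
    return table


ANALYZE = {side: build_analyzer(side) for side in (PERSO, ROMAN)}
# Generation is the inverse relation: analysis -> surface form.
GENERATE = {
    side: {analysis: surface for surface, analysis in ANALYZE[side].items()}
    for side in (PERSO, ROMAN)
}


def convert(word, source, target):
    """Analyze `word` in the source script, then regenerate it in the target."""
    analysis = ANALYZE[source].get(word)
    if analysis is None:
        return None  # word is not covered by the toy lexicon
    return GENERATE[target][analysis]
```

For example, `convert("ketabha", ROMAN, PERSO)` looks up the analysis `ketab+N+Pl` and regenerates the Perso-Arabic plural from it. A real transducer would encode the same relation compactly and handle productive morphology instead of enumerating every form.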
Similar Resources
Implementing Urdu Grammar as Open Source Software
Urdu is a challenging language because of, first, its Perso-Arabic script; second, its morphological system, which has inherent grammatical forms and vocabulary from Arabic, Persian and the native languages of South Asia; and third, its pragmatically neutral constituent order (SOV, Subject Object Verb). Today, the state-of-the-art technology for writing grammars (morphology + syntax) is to use special-purpose ...
Syllable Based Transcription of English Words into Perso-Arabic Writing System
This paper presents a rule-based method for transcription of English words into the Perso-Arabic orthography. The method relies on a phonetic representation of English words, such as the one provided by the CMU pronunciation dictionary. Some of the challenging problems are the context-based vowel representation in the Perso-Arabic writing system and the mismatch between the syllabic structures of English and Persi...
Sangam: A Perso-Arabic to Indic Script Machine Transliteration Model
The Indian subcontinent is one of those unique parts of the world where a single language may be written in different scripts. This is the case, for example, with Punjabi, which is written in Gurmukhi script (a Left to Right script based on Devnagri) in Indian East Punjab and in Shahmukhi (a Right to Left script based on Perso-Arabic) in Pakistani West Punjab. This is also the case with other lang...
Generating an Arabic Full-form Lexicon for Bidirectional Morphology Lookup
We describe the generation of an Arabic full-form lexicon and its conversion into a two-level Finite State Transducer (FST) for morphological analysis and generation. The implementation of morphological lookup is based on a representation of the relevant data in the form of an FST, for which generic implementations exist that facilitate the integration into larger software systems for natural langu...
Analysis of Noori Nasta'leeq for major Pakistani languages
Nasta’leeq is a bidirectional, diagonal, non-monotonic, cursive, highly context-sensitive and very complex writing style for languages like Urdu, Punjabi, Balochi and Kashmiri. Each is written in a variant of the Perso-Arabic script. The style is characterized by well-formed orthographic rules that are passed down from generation to generation of calligraphers and old manuscripts. It is present...